
438 References
Bartlett, P. L. & Mendelson, S. (2002), ‘Rademacher and Gaussian complexities: Risk
bounds and structural results’, Journal of Machine Learning Research 3, 463–482.
Ben-David, S., Cesa-Bianchi, N., Haussler, D. & Long, P. (1995), ‘Characterizations
of learnability for classes of {0,...,n}-valued functions’, Journal of Computer and
System Sciences 50, 74–86.
Ben-David, S., Eiron, N. & Long, P. (2003), ‘On the difficulty of approximately maxi-
mizing agreements’, Journal of Computer and System Sciences 66(3), 496–514.
Ben-David, S. & Litman, A. (1998), ‘Combinatorial variability of vapnik-chervonenkis
classes with applications to sample compression schemes’, Discrete Applied Mathe-
matics 86(1), 3–25.
Ben-David, S., Pal, D., & Shalev-Shwartz, S. (2009), Agnostic online learning, in ‘Con-
ference on Learning Theory (COLT)’.
Ben-David, S. & Simon, H. (2001), ‘Efficient learning of linear perceptrons’, Advances
in Neural Information Processing Systems pp. 189–195.
Bengio, Y. (2009), ‘Learning deep architectures for AI’, Foundations and Trends in
Machine Learning 2(1), 1–127.
Bengio, Y. & LeCun, Y. (2007), ‘Scaling learning algorithms towards ai’, Large-Scale
Kernel Machines 34.
Bertsekas, D. (1999), Nonlinear Programming, Athena Scientific.
Beygelzimer, A., Langford, J. & Ravikumar, P. (2007), ‘Multiclass classification with
filter trees’, Preprint, June .
Birkhoff, G. (1946), ‘Three observations on linear algebra’, Revi. Univ. Nac. Tucuman,
ser A 5, 147–151.
Bishop, C. M. (2006), Pattern recognition and machine learning, Vol. 1, springer New
York.
Blum, L., Shub, M. & Smale, S. (1989), ‘On a theory of computation and complexity
over the real numbers: Np-completeness, recursive functions and universal machines’,
Am. Math. Soc 21(1), 1–46.
Blumer, A., Ehrenfeucht, A., Haussler, D. & Warmuth, M. K. (1987), ‘Occam’s razor’,
Information Processing Letters 24(6), 377–380.
Blumer, A., Ehrenfeucht, A., Haussler, D. & Warmuth, M. K. (1989), ‘Learnability
and the Vapnik-Chervonenkis dimension’, Journal of the Association for Computing
Machinery 36(4), 929–965.
Borwein, J. & Lewis, A. (2006), Convex Analysis and Nonlinear Optimization, Springer.
Boser, B. E., Guyon, I. M. & Vapnik, V. N. (1992), A training algorithm for optimal
margin classifiers, in ‘Conference on Learning Theory (COLT)’, pp. 144–152.
Bottou, L. & Bousquet, O. (2008), The tradeoffs of large scale learning, in ‘NIPS’,
pp. 161–168.
Boucheron, S., Bousquet, O. & Lugosi, G. (2005), ‘Theory of classification: a survey of
recent advances’, ESAIM: Probability and Statistics 9, 323–375.
Bousquet, O. (2002), Concentration Inequalities and Empirical Processes Theory Ap-
plied to the Analysis of Learning Algorithms, PhD thesis, Ecole Polytechnique.
Bousquet, O. & Elisseeff, A. (2002), ‘Stability and generalization’, Journal of Machine
Learning Research 2, 499–526.
Boyd, S. & Vandenberghe, L. (2004), Convex Optimization, Cambridge University
Press.